library(dplyr)
library(INLA)
library(ggplot2)
library(patchwork)
library(inlabru)
library(spatstat)
library(sf)
library(scico)
library(spatstat)
library(lubridate)
library(terra)
library(tidyterra)Practical 7
Log-Gaussian Cox Processes
In this practical we will:
- Fit a homogeneous Point Process
- Fit a non-homogeneous Point Process
- Fit a LGCP (log-Gaussian Cox Process)
Libraries to load:
In this practical we consider the data clmfires in the spatstat library.
This dataset is a record of forest fires in the Castilla-La Mancha region of Spain between 1998 and 2007. This region is approximately 400 by 400 kilometres. The coordinates are recorded in kilometres. For more info about the data you can type:
?clmfiresWe first read the data and transform them into an sf object. We also create a polygon that represents the border of the Castilla-La Mancha region. We select the data for year 2004 and only those fires caused by lightning.
data("clmfires")
pp = st_as_sf(as.data.frame(clmfires) %>%
mutate(x = x,
y = y),
coords = c("x","y"),
crs = NA) %>%
filter(cause == "lightning",
year(date) == 2004)
poly = as.data.frame(clmfires$window$bdry[[1]]) %>%
mutate(ID = 1)
region = poly %>%
st_as_sf(coords = c("x", "y"), crs = NA) %>%
dplyr::group_by(ID) %>%
summarise(geometry = st_combine(geometry)) %>%
st_cast("POLYGON")
ggplot() + geom_sf(data = region, alpha = 0) + geom_sf(data = pp) Fit a homogeneous Poisson Process
As a first exercise we are going to fit a homogeneous Poisson process (HPP) to the data. This is a model that assume constant intensity over the whole space so our liner predictor is\[ \eta(s) = \log\lambda(s) = \beta_0 , \ \mathbf{s}\in\Omega \] so the likelihood can be written as: $$ \[\begin{aligned} p(\mathbf{y} | \lambda) & \propto \exp \left( -\int_\Omega \lambda(\mathbf{s}) \mathrm{d}\mathbf{s} \right) \prod_{i=1}^n \lambda(\mathbf{s}_i) \\ & = \exp\left( -\int_{\Omega}\exp(\beta_0)ds\right)\prod_{i=1}^n \lambda(\mathbf{s}_i) \\ \end{aligned}\]
$$ where \(|\Omega|\) is the area of the domain of interest.
We need to approximate the integral using a numerical integration scheme as: \[ \approx\exp\left(-\sum_{k=1}^{N_k}w_k\lambda(s_k)\right)\prod_{i=1}^n \lambda(\mathbf{s}_i) \] Where \(N_k\) is the number of integration points \(s_1,\dots,s_{N_k}\) and \(w_1,\dots,w_{N_k}\) are the integration weights.
In this case, since the intensity is constant, the integration scheme is really simple: it is enough to consider one random point inside the domain with weight equal to the area of the domain.
# define integration scheme
ips = st_sf(
geometry = st_sample(region, 1)) # some random location inside the domain
ips$weight = st_area(region) # integration weight is the area of the domain
cmp = ~ 0 + beta_0(1)
formula = geometry ~ beta_0
lik = bru_obs(data = pp,
family = "cp",
formula = formula,
ips = ips)
fit1 = bru(cmp, lik)Inhomogeneous Poisson Process
The model above has the clear disadvantages that assumes a constant intensity and from Figure Figure 1 we clearly see that this is not the case.
The library spatstat contains also some covariates that can help explain the fires distribution. Figure @fit-altitude shows the location of fires together with the (scaled) altitude.
elev_raster = rast(clmfires.extra[[2]]$elevation)
elev_raster = scale(elev_raster)
ggplot() + geom_spatraster(data = elev_raster) + geom_sf(data = pp) + scale_fill_scico()We are now going to use the altitude as a covariate to explain the variability of the intensity \(\lambda(s)\) over the domain of interest.
Our model is \[ \log\lambda(s) = \beta_0 + \beta_1x(s) \] where \(x(s)\) is the altitude at location \(s\).
The likelihood becomes: \[ \begin{aligned} p(\mathbf{y} | \lambda) & \propto \exp \left( -\int_\Omega \lambda(\mathbf{s}) \mathrm{d}\mathbf{s} \right) \prod_{i=1}^n \lambda(\mathbf{s}_i) \\ & = \exp \left( -\int_\Omega \exp(\beta_0 + \beta_1x(s)) \mathrm{d}\mathbf{s} \right) \prod_{i=1}^n \lambda(\mathbf{s}_i) \\ \end{aligned} \] Now we need to choose an integration scheme to solve the integral.
In this case we will take a simple grid based approach where each quadrature location has an equal weight. Our grid consists of \(N_k = 1000\) points and the weights are all equal to \(|\Omega|/N_k\).
n.int = 1000
ips = st_sf(geometry = st_sample(region,
size = n.int,
type = "regular"))
ips$weight = st_area(region) / n.int
ggplot() + geom_sf(data = ips, aes(color = weight)) + geom_sf(data= region, alpha = 0)OBS: The implicit assumption here is that the intensity is constant inside each grid box, and so is the covariate!!
We can now fit the model:
cmp = ~ Intercept(1) + elev(elev_raster, model = "linear")
formula = geometry ~ Intercept + elev
lik = bru_obs(data = pp,
family = "cp",
formula = formula,
ips = ips)
fit2 = bru(cmp, lik)est_grid = st_as_sf(data.frame(crds(elev_raster)), coords = c("x","y"))
est_grid = st_intersection(est_grid, region)
preds2 = predict(fit2, est_grid, ~ data.frame(log_scale = Intercept + elev,
lin_scale = exp(Intercept + elev)))
preds2$lin_scale %>% ggplot() + geom_sf(aes(color = mean)) + scale_color_scico()Lam_samps2 = generate(fit2, ips, ~ sum(weight * exp(elev + Intercept)),
n.samples = 2000)
Lam_df = data.frame(
Lam = as.numeric(Lam_samps2))
ggplot(Lam_df) +
geom_histogram(aes(x = Lam),
colour = "blue",
alpha = 0.5,
bins = 20) +
geom_vline(xintercept = nrow(pp),
colour = "red") +
theme_minimal() +
xlab(expression(Lambda))Log-Gaussian Cox Process
mesh = fm_mesh_2d(boundary = region,
max.edge = c(5, 10),
cutoff = 4, crs = NA)
ggplot() + gg(mesh) + geom_sf(data = pp)spde_model = inla.spde2.pcmatern(mesh,
prior.sigma = c(1, 0.5),
prior.range = c(100, 0.5))
ips = fm_int(mesh, samplers = region)
ggplot() + geom_sf(data = ips, aes(color = weight)) +
gg(mesh) +
scale_color_scico()cmp = ~ Intercept(1) + space(geometry, model = spde_model) + elev(elev_raster, model = "linear")
formula = geometry ~ Intercept + space + elev
lik = bru_obs("cp",
formula = formula,
data = pp,
ips = ips)
fit3 = bru(cmp, lik)
Lam_samps3 = generate(fit3, ips, ~ sum(weight * exp(space + Intercept + elev)),
n.samples = 2000)
Lam_df3 = data.frame(
Lam = as.numeric(Lam_samps3))
ggplot(Lam_df3) +
geom_histogram(aes(x = Lam),
colour = "blue",
alpha = 0.5,
bins = 20) +
geom_vline(xintercept = nrow(pp),
colour = "red") +
theme_minimal() +
xlab(expression(Lambda))